Abstract

The  purpose  of  this  study  was  to  conduct  a  limited  review  of  literature  published  between 
January  1986  and  May  2001  concerning  the  accuracy  and  reliability  of  screening  and  diagnostic 
tests in polygraph, medicine, and psychology. Of the 5,189 hits produced by the literature
search, 1,158 articles and abstracts were reviewed and 145 were found to be useful, yielding data
on 198 studies. For field screening assessments, the sensitivity of polygraph, medical, and
psychological  tools  was  .59,  .79,  and  .74  respectively.  Specificity  of  polygraph,  medical,  and 
psychological  screening  was  .90,  .94,  and  .78.  For  field  diagnostic  assessments,  the  sensitivity  of 
polygraph,  medical,  and  psychological  tools  was  .92,  .83,  and  .72.  Specificity  of  polygraph, 
medical,  and  psychological  diagnostic  testing  was  .83,  .88,  and  .67  respectively.  Agreement  was 
measured  with  kappa.  Among  readers  in  polygraph,  medicine,  and  psychology  kappa  was  .77, 
.56,  and  .79  respectively.  Reports  in  the  literature  of  polygraph's  accuracy  and  reliability 
(agreement)  on  specific  issues  appear  to  be  consistent  with  published  studies  on  medical  and 
psychological  assessment  tools.  However,  there  is  an  enormous  range  of  accuracy  and 
agreement  not  only  within  polygraph  but  also  medicine  and  psychology.  Although  there  were 
very  few  polygraph  screening  studies,  accuracy  reports  were  lower  than  those  in  medicine  and 
psychology. 


An Executive Summary is available on page 26 of this report.


Introduction

The  purpose  of  this  study  was  to  conduct  a  limited  review  of  the  literature  concerning  the 
accuracy  and  reliability  of  screening  and  diagnostic  tests  in  polygraph,  medicine,  and 
psychology.  Measures  in  common  use  today  for  evaluating  assessment  tools  assume  perfection 
is  the  benchmark  of  a  tool's  efficacy.  This  inevitably  causes  disappointment  in  the  performance 
of  assessment  tools,  since  they  rarely  produce  100%  accuracy  or  reliability  unless  significant 
tradeoffs  are  made.  The  premise  of  this  study  is  that  something  less  than  perfection  is  the 
common  outcome  of  assessment  tool  studies.  What  follows  is  an  effort  to  put  the  reported 
accuracy  and  reliability  of  polygraph  in  context  with  studies  from  the  medical  and  psychological 
literature.  It  is  important  to  recognize  that  comparing  assessment  tools  across  different 
disciplines  and  technologies  will  not  clarify  whether  or  not  polygraph  is  an  accurate  or  reliable 
means for detecting truth and deception. It does, however, place the less-than-perfect
performance of polygraph alongside other commonly used diagnostic and screening tools.

The  literature  review  focuses  on  the  validity  (accuracy)  and  reliability  (agreement)  of 
polygraph  as  it  compares  to  other  assessment  tools  outside  the  framework  of  the  detection  of 
deception.  The  primary  focus  is  on  common  medical  (diagnostic  radiology)  assessment  tools 
such  as  ultrasound  (US),  x-rays,  computed  tomography  (CT),  and  magnetic  resonance  imaging 
(MRI)  along  with  psychological  assessment  tools  such  as  the  Minnesota  Multiphasic  Personality 
Inventory  (MMPI)  and  the  Diagnostic  and  Statistical  Manual  of  Mental  Disorders  (DSM-III  and 
IV).  Polygraph's  approach  involves  a  human  reader  using  technology  to  measure  and  interpret 
physiological  conditions  and  responses  to  make  a  diagnosis.  This  process  is  very  similar  to  the 
mechanics  involved  in  diagnostic  radiology.  This  connection  between  reader,  technology,  and 
examinee is less evident in the psychological literature where many of the assessment tools are
paper-and-pencil tests with established scoring systems that require no reader interpretation. An
effort  was  made  to  locate  assessment  tools  in  education  and  personnel  screening  but  few  were 
found  that  either  a)  reported  data  that  were  comparable  or  b)  were  not  already  incorporated  into 
the  psychology  literature. 

A very utilitarian approach was used in gathering data for this comparative analysis.
A study was selected when its abstract either a) contained sufficient information for use in
the analysis or b) led this investigator to believe that usable data could be obtained from the
article text. What follows is a very specific delineation of the methodology used. Consumers of
this report should be warned that the summary measures calculated from this set of research
studies might differ if another selection of studies were used.


Method

In  addition  to  information  provided  by  the  Department  of  Defense  Polygraph  Institute 
(DoDPI)  research  staff,  a  literature  search  was  conducted  through  MEDLINE  of  the  National 
Library of Medicine, PsycINFO Direct of the American Psychological Association, and an index
of  polygraph  studies  on  the  National  Polygraph  Consultants  website.  The  search  was  limited  to 
research  published  in  the  past  fifteen  years,  but  some  results  from  review  articles  include  earlier 
studies.  The  keywords  used  in  the  literature  search  are  listed  in  Table  1. 


Table 1: Search Keywords

screening                 multi-rater
screening test            test reliability
screening evaluation      personnel testing
screening techniques      security screening
screening accuracy        psychological testing
diagnostic                MMPI
diagnostic test           validity
diagnostic evaluation     accuracy
diagnostic techniques     sensitivity
diagnostic accuracy       specificity
reliability               area under the curve
agreement                 receiver operating characteristic curve
kappa                     ROC
percent agreement
rater agreement
test validity

DoDPI  staff  provided  assistance  in  collecting  hard  copy  versions  of  polygraph  articles 
and  also  gave  guidance  on  relevant  polygraph  studies  not  found  during  the  literature  search.  The 
polygraph  studies  were  reviewed  first  to  identify  measures  of  accuracy  and  agreement  that  would 
be common to all three professions. Accuracy and agreement measures were reported in several
different forms and in several cases had to be converted to maintain consistency. Table 2 lists
the  common  measures  used  for  this  review.  These  terms  are  more  common  in  the  medical 
literature  but  their  meanings  are  directly  transferable  to  polygraph  and  psychology. 


Table 2: Summary of common terms used and not used in this study

Terms Used            Definition                                    Also Known As
--------------------  --------------------------------------------  ------------------------
Sensitivity (Se)      The proportion of diseased cases with a       True Positive Rate (TPR)
                      positive test (perfect accuracy = 1.0)
Specificity (Sp)      The proportion of non-diseased cases with     True Negative Rate (TNR)
                      a negative test (perfect accuracy = 1.0)
Total Accuracy        (Se+Sp)/2 (perfect accuracy = 1.0)            Lykken's formula
Percent Agreement     The proportion of all readings conducted
                      by two readers in which their
                      interpretations agreed.
Kappa (k)             Coefficient representing agreement
                      obtained between two readers beyond
                      chance. A value of 1 represents perfect
                      agreement. A value of 0 represents no
                      agreement beyond chance.

Terms Not Used        Definition                                    Also Known As
--------------------  --------------------------------------------  ------------------------
False Positive Rate   The proportion of non-diseased cases with     1 - Specificity
                      a positive test (perfect accuracy = 0.0)
False Negative Rate   The proportion of diseased cases with a       1 - Sensitivity
                      negative test (perfect accuracy = 0.0)
Total Accuracy        # of correct interpretations ÷ # of total
                      interpretations

The primary measures used in this report are sensitivity, specificity, and kappa.
Sensitivity reflects the proportion of diseased cases correctly identified by an assessment tool. In
polygraph, disease is analogous to deception. A sensitivity of 1.0 indicates the tool correctly
identifies 100% of cases with the target condition. Specificity reflects the proportion of
non-diseased (truthful) cases correctly identified by an assessment tool. A specificity of 1.0 indicates
the tool correctly identifies 100% of cases without the target condition. Sensitivity and
specificity are also averaged to give one combined estimate of accuracy. The combined
accuracy measure is the same as that advocated by Lykken (1983) and used in a review of
polygraph studies by McCauley and Forman (1988). The fourth measure is kappa. Kappa is a
coefficient that represents agreement obtained between two readers beyond what would be
expected by chance alone. A value of 1.0 represents perfect agreement. A value of 0.0
represents agreement no better than chance. Kappa can also range to -1.0 (perfect disagreement), but there are no
negative kappas reported in this study. A list of common terms that are not used is also
provided in Table 2.
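
As a minimal sketch of how the accuracy measures in Table 2 relate to one another, the following Python computes sensitivity, specificity, the combined accuracy, and the unused "correct divided by total" form of accuracy from the four cells of a two-by-two table. The function names and the example counts are hypothetical; they are chosen for illustration and are not taken from any study in this review.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of diseased (deceptive) cases that receive a positive test."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of non-diseased (truthful) cases that receive a negative test."""
    return true_neg / (true_neg + false_pos)


def combined_accuracy(se, sp):
    """Unweighted average of sensitivity and specificity, (Se + Sp) / 2."""
    return (se + sp) / 2


def proportion_correct(true_pos, true_neg, false_pos, false_neg):
    """Correct interpretations divided by total interpretations."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total


# Hypothetical counts: 20 deceptive examinees and 180 truthful examinees.
tp, fn = 12, 8
tn, fp = 162, 18

se = sensitivity(tp, fn)
sp = specificity(tn, fp)
print(f"Se = {se:.2f}, Sp = {sp:.2f}")
print(f"Combined accuracy  = {combined_accuracy(se, sp):.2f}")
print(f"Proportion correct = {proportion_correct(tp, tn, fp, fn):.2f}")
```

With these hypothetical counts, sensitivity is .60 and specificity is .90, so the combined accuracy is .75, while the proportion of correct interpretations is .87. The two forms of "Total Accuracy" in Table 2 can therefore differ noticeably when the diseased and non-diseased groups are of unequal size.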

Upon review, each study was categorized as either analog (laboratory) or field-based (actual
cases) and according to whether it measured an assessment tool in a screening or diagnostic
application. Screening applications involve the use of an assessment tool on a general
population in which there is no specific evidence of disease. As an example, screening
mammography is routinely used on asymptomatic women in the hope of finding disease at an
early stage. The diagnostic category corresponds to the polygraph specific issue test and is reserved for
studies where there is prior evidence a condition exists, such as when a test is ordered after a
clinical examination of a patient suggests an abnormality. As an example, diagnostic
mammography is used on symptomatic women: those who have discovered a lump or other
abnormality in the breast.

1 Fleiss, J.L. (1981). Statistical Methods for Rates and Proportions. John Wiley, New York.
Landis, J.R. and G.G. Koch. (1977). "The Measurement of Observer Agreement for Categorical Data." Biometrics

Abstracts  were  reviewed  for  evidence  of  comparable  measures  of  accuracy  and 
agreement.  Exploratory  studies,  newsletters,  commentaries,  non-established  scales,  duplications, 
and  studies  that  were  unlikely  to  produce  appropriate  statistics  were  avoided.  Agreement  studies 
that  didn't  compare  interpretations  between  two  or  more  human  raters  were  not  used.  If  accuracy 
or  agreement  were  reported  separately  by  various  control  groups  (sex,  race,  age)  an  effort  was 
made  to  calculate  an  average.  Scales  that  did  not  have  an  established  cutoff  for  disease  were  not 
used. Only studies investigating a procedure in common use were included. A procedure was judged
not to be in common use when the text described it with words and phrases such as "preliminary,"
"could be used," or "potential for." When accuracy was presented for both a newly proposed
technique and an established one, only the
data for the established technique were used. Studies involving the use of two or more
procedures  to  form  a  decision  and  studies  designed  to  stage  the  progression  of  known  disease 
were  also  excluded.  Medical  studies  involving  invasive  scopes  were  not  used;  nor  did  this 
review  include  any  medical  diagnostic  tests  outside  radiology,  such  as  pathology  or  cardiology. 
When  accuracy  was  reported  at  several  cutoffs,  the  first  disease  cutoff  was  used  if  there  was  no 
other  indication  of  recommended  practice.  Contrary  to  the  review  conducted  by  the  Office  of 
Technology  Assessment  (OTA)  in  1983,  inconclusive  results  were  not  used  in  the  accuracy 
estimates.  Although  inconclusives  were  rarely  mentioned  in  the  medical  and  psychological 
literature,  when  they  were  mentioned,  they  were  explicitly  excluded  from  the  accuracy  estimates. 
Inconclusive  interpretations  were  used  for  agreement  statistics  when  the  data  were  available. 
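
The effect of excluding inconclusive results can be illustrated with a short sketch. The counts below are hypothetical, and keeping inconclusive calls in the denominator is only one possible way of including them; it is assumed here for illustration and is not necessarily the exact method used by OTA.

```python
def sensitivity_excluding_inconclusives(true_pos, false_neg):
    """Sensitivity over conclusive calls only, the convention followed in this review."""
    return true_pos / (true_pos + false_neg)


def sensitivity_with_inconclusives_as_errors(true_pos, false_neg, inconclusive):
    """Sensitivity when inconclusive calls stay in the denominator (an assumed convention)."""
    return true_pos / (true_pos + false_neg + inconclusive)


# Hypothetical deceptive examinees: 45 called deceptive, 5 called truthful, 10 inconclusive.
tp, fn, inc = 45, 5, 10

se_excluded = sensitivity_excluding_inconclusives(tp, fn)
se_included = sensitivity_with_inconclusives_as_errors(tp, fn, inc)
print(f"Inconclusives excluded: {se_excluded:.2f}")
print(f"Inconclusives counted as errors: {se_included:.2f}")
```

Under these assumptions the same set of examinations yields a sensitivity of .90 when inconclusives are excluded and .75 when they are counted against the examiner, so estimates built on the two conventions are not directly comparable.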

Data collected on accuracy, agreement, number of subjects, and number of studies were
entered into a spreadsheet. These data were double-verified for accuracy. The spreadsheet was
used to sort studies and quantify summary measures. The mean, median, minimum, and maximum
values were calculated to summarize the overall results of the screening and diagnostic studies
found in this inquiry. This approach is similar to prior reviews. No statistical analysis was
conducted, nor is one recommended.
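
The summary measures described above can be sketched in a few lines of Python using only the standard library. The function name is hypothetical. The two input values are the minimum and maximum sensitivities reported in Table 4 for the two field polygraph screening studies; with only two studies, these are the study values themselves.

```python
from statistics import mean, median


def summarize(values):
    """Return the descriptive measures reported for each group of studies."""
    return {
        "mean": mean(values),
        "median": median(values),
        "minimum": min(values),
        "maximum": max(values),
        "studies": len(values),
    }


# Sensitivities of the two field polygraph screening studies (Table 4).
field_polygraph_screening_sensitivity = [0.45, 0.73]

summary = summarize(field_polygraph_screening_sensitivity)
for name, value in summary.items():
    text = f"{value:.2f}" if isinstance(value, float) else str(value)
    print(f"{name}: {text}")
```

The mean and median of these two values are both .59, which agrees with the field polygraph column of Table 4.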


Weaknesses 

Before  continuing,  the  results  of  this  review  should  be  put  into  context  by  clearly  noting 
several  weaknesses  in  both  study  design  and  application.  This  report  contains  a  fifteen-year 
snapshot  of  three  literature  domains,  not  definitive  estimates  of  diagnostic  test  performance. 
Therefore  the  greatest  concern  is  overgeneralization  of  the  results  beyond  their  simple  intent  to 
frame  the  science  of  accuracy  and  reliability.  In  addition,  this  review  should  be  viewed  with  the 
following  caveats  in  mind: 

1.  This  is  not  a  systematic  review  of  the  literature  in  polygraph,  medicine,  or  psychology. 
Specific  rules  were  followed  to  collect  examples  of  the  relevant  body  of  literature,  but 
there  was  also  a  very  utilitarian  perspective  taken  in  obtaining  a  sampling  of  accuracy  and 
agreement  reports  on  commonly  known  assessment  tools. 

2.  The  summary  statistics  reported  for  polygraph,  medicine,  and  psychology  should  not  be 
interpreted  as  generalizable  to  all  assessment  tools  or  applications  within  these 
professions.  The  summary  statistics  are  simply  a  method  of  conveying  the  central 
tendency  and  variation  of  accuracy  and  reliability  estimates  reported  in  the  literature. 
They  are  not  statements  of  accuracy  for  a  particular  procedure  or  profession.  There  is 
much  more  that  would  need  to  be  done  to  develop  that  level  of  precision.  As  an  example, 
a systematic and replicable review should include special analytical techniques such as
meta-analysis, study quality scoring, exclusion of low-quality studies, and exhaustive
disease-technology  specific  literature  searches.  This  would  be  an  enormous  undertaking 
that  far  exceeds  the  objective  of  this  study. 

3.  Similar  to  one  of  the  weaknesses  mentioned  above,  this  review  did  not  make  any  effort  to 
determine  the  quality  of  the  research  that  produced  the  statistics  reported  in  the  tables  that 
follow. 

4. It should be noted that the tools used for polygraph, medicine, and psychology are not
directly comparable in their technology, application, or patient populations.


Results

The  results  of  this  literature  review  are  separated  into  several  sections.  After  reviewing 
the  results  of  the  search  effort,  the  overall  results  for  screening  and  diagnostic  accuracy  will  be 
presented.  This  will  be  followed  by  a  rank-ordered  comparison  of  accuracy  as  it  relates  to 
common  medical  and  psychological  diseases  (e.g.  appendicitis,  depression).  A  rank-ordered 
comparison  will  also  be  provided  by  assessment  technique  (MRI,  MMPI,  etc.).  The  results 
section  will  conclude  with  a  review  of  reader  agreement. 

Literature  Search 

A search for polygraph studies was conducted on April 1, 2001 through the index
provided by National Polygraph Consultants (www.nationalpolygraphconsultants.com). Of the
152 articles found, 42 were reviewed, and data from 16 articles were used, representing 51 separate
studies (see Table 3). A search for medical studies was conducted on April 30, 2001 through the
PubMed index (www.ncbi.nlm.nih.gov/PubMed). The search looked for keywords in both the
title and abstract and covered the time frame 1/1/1986 through 4/30/2001. Since there were tens
of thousands of hits in PubMed, the search was refined to records whose title or
abstract contained both a common imaging modality (plain film, mammography, ultrasound,
CT, MRI) and kappa, sensitivity, specificity, or receiver operating characteristic curve (ROC).
Abstracts and/or full text of 933 articles were reviewed. Data from 90 of these articles were
used, representing 90 separate studies. A search for psychological literature was conducted via
PsycINFO Direct (http://www.psycinfo.com) on April 29, 2001 for keywords in the abstract. The
search covered 1/1/1985 to 4/29/2001. Of the 3,975 articles found, 183 were reviewed, and data
from 39 articles were used, representing 57 separate studies.


Table 3: Search Results

Field           Hits      Reviewed   Articles Used   Studies Reported
Polygraph       152       42         16              51
Medical         1,065     933        90              90
Psychological   3,975     183        39              57
Total           5,189     1,158      145             198

Although  analog  studies  were  very  common  in  the  polygraph  literature,  none  were  found 
in  the  psychology  literature  and  only  two  were  found  in  the  medical  literature.  As  a  result,  most 
comparisons  mentioned  in  the  report  focus  on  describing  field  polygraph  accuracy  (bolded  in 
tables);  however,  the  tables  also  include  results  for  analog  polygraph  along  with  analog  and  field 
studies  averaged  into  one  combined  accuracy  measure.  Some  articles  reported  more  than  one 
accuracy  or  agreement  estimate.  As  a  result,  the  count  of  studies  provided  in  some  of  the  tables 
may  sum  to  more  than  what  is  reported  in  Table  3.  A  complete  listing  of  the  data  used  in  this 
report  is  contained  in  the  Appendix. 


Accuracy  of  Screening  Techniques 

Five polygraph screening studies were found. Based on three analog studies, the mean
sensitivity of polygraph screening (.76) is greater than that reported in the two field polygraph
screening studies (.59). Specificity in analog polygraph screening studies (.82) is lower than in field
screening studies (.90). For the ten medical screening studies, both the mean sensitivity (.79) and
specificity (.94) are greater than for polygraph. Psychology screening reports (36 studies) have
greater sensitivity (.74) than field polygraph but lower specificity (.78).


Table 4: Accuracy of screening techniques in polygraph, medicine, and psychology

                      -------- Polygraph --------
                      Analog    Field   Combined*   Medicine   Psychology
Sensitivity (TPR)
  Mean                  0.76     0.59        0.67       0.79         0.74
  Median                0.67     0.59        0.63       0.78         0.79
  Minimum               0.61     0.45        0.53       0.51         0.11
  Maximum               1.00     0.73        0.86       0.97         1.00
  Studies                  3        2           5         10           36
Specificity (TNR)
  Mean                  0.82     0.90        0.86       0.94         0.78
  Median                0.83     0.90        0.87       0.93         0.85
  Minimum               0.63     0.87        0.75       0.87         0.00
  Maximum               1.00     0.93        0.97       1.00         1.00
  Studies                  3        2           5         10           36
Combined Accuracy
  Mean                  0.79     0.74        0.77       0.86         0.76
  Median                0.72     0.74        0.73       0.85         0.78
  Minimum               0.65     0.69        0.67       0.76         0.42
  Maximum               1.00     0.80        0.90       0.99         0.98
  Studies                  3        2           5         10           36
Number of Subjects
  Mean                    50      467         258     56,581          996
  Median                  40      467         253     19,758          307
  Minimum                 40      200         120         79           55
  Maximum                 71      733         402    202,070       16,235
  Studies                  3        2           5         10           36

* (analog + field)/2

Overall, the mean reported combined accuracy of field screening polygraph (.74) is similar to
that of screening psychology studies (.76) but lower than the mean combined screening accuracy for
medical studies (.86). The range between the minimum and maximum combined accuracy
estimates from the literature is very different for field polygraph (.69 to .80), medicine (.76 to .99),
and psychology (.42 to .98). On average, field polygraph screening studies use about half (467) as
many subjects as psychology studies (996) and far fewer than reported for medical screening
studies (56,581).
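
Because averaging is linear, the mean of the per-study combined accuracies should equal the average of the mean sensitivity and mean specificity whenever the same studies contribute to both. The following sketch checks the mean rows of Table 4 against that expectation. The variable names are hypothetical, and agreement is only expected to within the rounding of the published two-decimal values.

```python
# (mean sensitivity, mean specificity, reported mean combined accuracy) from Table 4.
table4_means = {
    "polygraph (analog)": (0.76, 0.82, 0.79),
    "polygraph (field)": (0.59, 0.90, 0.74),
    "polygraph (combined)": (0.67, 0.86, 0.77),
    "medicine": (0.79, 0.94, 0.86),
    "psychology": (0.74, 0.78, 0.76),
}

gaps = {}
for group, (se, sp, reported) in table4_means.items():
    recomputed = (se + sp) / 2  # combined accuracy, (Se + Sp) / 2
    gaps[group] = abs(recomputed - reported)
    print(f"{group}: recomputed {recomputed:.3f}, reported {reported:.2f}")

max_gap = max(gaps.values())
```

For every column the recomputed value lies within about .005 of the reported one, which is consistent with rounding alone.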


Accuracy  of  Diagnostic  Techniques 

There are 37 field polygraph, 94 medical, and 51 psychology diagnostic studies reported
in Table 5. Mean sensitivity and specificity reported in field polygraph diagnostic studies are
greater than those based on analog diagnostic studies. The mean sensitivities reported for
medicine (.83) and psychology (.72) are lower than that for field polygraph studies (.92). Although
polygraph field studies have a mean specificity (.83) that is greater than that of psychology studies (.67),
it is somewhat lower than that reported in the medical studies (.88).
Overall, the mean combined diagnostic accuracies of polygraph (.88) and medical (.86) studies are
very similar. The ranges between the minimum and maximum combined accuracy estimates from
the literature are also similar for polygraph (.64 to 1.0), medicine (.60 to 1.0), and psychology
(.50 to .93). On average, polygraph diagnostic studies use about half as many subjects (108) as
psychology studies (218) and fewer than half as many as medical studies (284).

Table 5: Accuracy of diagnostic techniques in polygraph, medicine, and psychology

                      -------- Polygraph --------
                      Analog    Field    Combined   Medicine   Psychology
Sensitivity (TPR)
  Mean                  0.89     0.92        0.91       0.83         0.72
  Median                0.92     0.95        0.94       0.85         0.71
  Minimum               0.63     0.71        0.67       0.25         0.37
  Maximum               1.00     1.00        1.00       1.00         0.96
  Studies                 18       37          55         94           51
Specificity (TNR)
  Mean                  0.78     0.83        0.81       0.88         0.67
  Median                0.79     0.90        0.85       0.93         0.65
  Minimum               0.49     0.43        0.46       0.44         0.41
  Maximum               0.97     1.00        0.99       1.00         0.95
  Studies                 18       37          55         94           51
Combined Accuracy
  Mean                  0.84     0.88        0.86       0.86         0.70
  Median                0.85     0.90        0.87       0.88         0.69
  Minimum               0.60     0.64        0.62       0.60         0.50
  Maximum               0.98     1.00        0.99       1.00         0.93
  Studies                 18       37          55         94           51
Number of Subjects
  Mean                    72      108          90        284          218
  Median                  55       64          60        124           84
  Minimum                 15       16          16         23           29
  Maximum                192      959         576      4,811        1,079
  Studies                 18       37          55         89           51


Accuracy  by  Target  Condition 

Table  6  reports  screening  and  diagnostic  accuracy  by  target  condition  and  assessment 
technique  used.  The  list  is  ordered  from  highest  to  lowest  mean  combined  accuracy.  Diagnosing 
acute  appendicitis  with  computed  tomography  (CT)  has  the  greatest  combined  accuracy  (.96). 
Based  on  five  studies,  CT  has  a  sensitivity  of  .95  and  specificity  of  .98  in  the  diagnosis  of  acute 
appendicitis. 

Table 6: Rank ordered "Combined Accuracy" on common medical & psychological diseases

                                        ------ Average Accuracy ------
Target Condition         Technique      Sensitivity  Specificity  Combined   Number of
                                        (TPR)        (TNR)        Accuracy   Studies
Acute Appendicitis       CT             0.95         0.98         0.96       5
Brain Tumor              MRI            0.93         0.98         0.95       2
Carotid Artery Disease   US             0.89         0.93         0.91       14
Acute Appendicitis       US             0.84         0.97         0.91       2
Breast Cancer            US             0.92         0.87         0.90       3
Deception                Polygraph      0.92         0.83         0.88       37
Breast Cancer            MRI            0.98         0.74         0.86       3
Breast Cancer (screen)   Plain Film     0.79         0.92         0.86       4
Multiple Sclerosis       MRI            0.73         0.93         0.83       2
Breast Cancer            Plain Film     0.78         0.83         0.80       7
Alcohol Abuse (screen)   MAST*          0.80         0.78         0.79       4
Deception (screen)       Polygraph      0.59         0.90         0.74       2
Personality Disorders    DSM-IV**       0.84         0.60         0.72       3
Depression               MMPI           0.68         0.65         0.67       25

*Also included a study using MMPI
**Also included studies using ICD-10 and a Personality Index


Diagnosing  depression  with  the  MMPI  has  the  lowest  mean  combined  accuracy  (.67)  reported  in 
Table  6.  Based  on  37  studies,  diagnostic  field  polygraph  studies  have  an  average  combined 
accuracy  of  .88.  This  is  similar  to  using  ultrasound  to  diagnose  carotid  artery  disease  (.91),  acute 
appendicitis (.91), and breast cancer (.90). It is also similar to using MRI (.86) and plain film
(.86) to diagnose breast cancer. The combined accuracy of screening polygraph is one of the
lowest  reported  in  Table  6. 


Accuracy  by  Evaluation  Tool 

Table  7  reports  accuracy  by  type  of  evaluation  tool.  Similar  to  Table  6,  the  list  is  ordered 
from  highest  to  lowest  mean  combined  accuracy.  Based  on  37  field  studies,  diagnostic  (specific 
issue) polygraph has the highest combined accuracy (.88). Overall, however, the combined
diagnostic accuracy reported in field polygraph studies is very similar to that reported in MRI
(.87), CT (.86), and ultrasound (.86) diagnostic studies. The MMPI, whether used for screening (.61) or
diagnosis (.67), has the lowest average combined accuracy.
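
The rank ordering can be reproduced with a short sketch that sorts the tools by the average of their mean sensitivity and mean specificity. The sensitivity and specificity values are those listed in Table 7; the variable names are hypothetical, and because the inputs are already rounded to two decimals the recomputed averages may differ slightly from the unrounded figures behind the table.

```python
# (evaluation tool, mean sensitivity, mean specificity) from Table 7, in arbitrary order.
tools = [
    ("MMPI (screening)", 0.70, 0.53),
    ("CT", 0.83, 0.89),
    ("Polygraph (screening)", 0.59, 0.90),
    ("MRI", 0.86, 0.88),
    ("DSM-IV", 0.72, 0.68),
    ("Polygraph", 0.92, 0.83),
    ("Plain Film", 0.77, 0.85),
    ("MMPI", 0.68, 0.65),
    ("US", 0.84, 0.87),
    ("MAST (screening)", 0.64, 0.92),
]


def combined(row):
    """Combined accuracy, (Se + Sp) / 2, for one (tool, Se, Sp) row."""
    _, se, sp = row
    return (se + sp) / 2


# Sort from highest to lowest combined accuracy.
ranked = sorted(tools, key=combined, reverse=True)
for position, row in enumerate(ranked, start=1):
    print(f"{position:2d}. {row[0]:<22} {combined(row):.3f}")

ranked_names = [row[0] for row in ranked]
```

Sorting on these recomputed averages places diagnostic polygraph first and screening MMPI last, the same order as the rows of Table 7.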


Table 7: Rank ordered "Combined Accuracy" of diagnostic & screening tools

                         ------ Average Accuracy ------
Evaluation Tool          Sensitivity  Specificity  Combined   Number of
                         (TPR)        (TNR)        Accuracy   Studies
Polygraph                0.92         0.83         0.88       37
MRI                      0.86         0.88         0.87       17
CT                       0.83         0.89         0.86       19
US                       0.84         0.87         0.86       38
Plain Film               0.77         0.85         0.81       12
MAST (screening)         0.64         0.92         0.78       3
Polygraph (screening)    0.59         0.90         0.74       2
DSM-IV                   0.72         0.68         0.70       1
MMPI                     0.68         0.65         0.67       17
MMPI (screening)         0.70         0.53         0.61       5


Inter-Rater Agreement

Agreement  among  raters  is  measured  as  either  the  percent  of  cases  in  which  two  raters 
agree  on  an  interpretation  or  the  proportion  of  agreement  beyond  that  expected  by  chance,  which 
is represented by the kappa coefficient. Although these are very common measures of
agreement, neither was reported often in the literature reviewed for this study.
As an example, only one study gathered in the search of the psychology literature
reported between-rater percent agreement. All agreement studies reported in Table 8 are based
on  field  studies.  There  were  only  three  screening  studies  found  that  reported  agreement  data  and 
these  were  all  polygraph.  There  were  no  analog  studies  found. 

The eight polygraph studies reporting percent agreement averaged 91% among polygraph
examiners compared to 81% for physicians (based on five studies). Kappa coefficients were
found in all three disciplines. Based on six studies in the psychology literature, the mean kappa
among psychologists is .79. This is similar to that of polygraph examiners (.77) but greater than that
reported for physicians (.56). It is important to note that kappa is a chance-corrected measure. This means
that the kappa coefficient depends on both agreement and the distribution of cases used in a
particular study. Two studies with identical percent agreements can have dramatically different
kappas if the distribution of subject diagnoses (the proportion of subjects with and without
disease) varies. As a result, it is very difficult to compare kappa from one study to the next, either
within the same discipline or between two disciplines.
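
The dependence of kappa on the distribution of cases can be shown with a small sketch of the two-rater, two-category kappa coefficient. The function name and the counts in both examples are hypothetical; they are constructed so that the raters agree on 90 of 100 cases in each example.

```python
def cohen_kappa(both_pos, r1_pos_r2_neg, r1_neg_r2_pos, both_neg):
    """Two-rater kappa for a positive/negative call, from the four agreement cells.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = both_pos + r1_pos_r2_neg + r1_neg_r2_pos + both_neg
    observed = (both_pos + both_neg) / n
    r1_pos = (both_pos + r1_pos_r2_neg) / n   # share of cases rater 1 calls positive
    r2_pos = (both_pos + r1_neg_r2_pos) / n   # share of cases rater 2 calls positive
    expected = r1_pos * r2_pos + (1 - r1_pos) * (1 - r2_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical study A: each rater calls half of the cases positive.
kappa_balanced = cohen_kappa(both_pos=45, r1_pos_r2_neg=5, r1_neg_r2_pos=5, both_neg=45)

# Hypothetical study B: each rater calls only one case in ten positive.
kappa_skewed = cohen_kappa(both_pos=5, r1_pos_r2_neg=5, r1_neg_r2_pos=5, both_neg=85)

print(f"Balanced cases: 90% agreement, kappa = {kappa_balanced:.2f}")
print(f"Skewed cases:   90% agreement, kappa = {kappa_skewed:.2f}")
```

Both hypothetical studies have 90% agreement, yet kappa is .80 when positive and negative calls are evenly split and about .44 when positive calls are rare, because far more agreement is expected by chance in the second case.
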
Table 8: Inter-rater agreement on diagnostic cases among polygraph examiners,
physicians, and psychologists

                      Polygraph
                      Examiners    Physicians   Psychologists
Percent Agree
  Mean                91%          81%          88%
  Median              91%          80%          88%
  Minimum             77%          77%          88%
  Maximum             100%         85%          88%
  Studies             8            5            1
Kappa (bi-rater)
  Mean                0.77         0.56         0.79
  Median              0.80         0.60         0.79
  Minimum             0.53         0.34         0.64
  Maximum             1.00         0.72         0.91
  Studies             9            13           6
Number of Subjects
  Mean                102          150          174
  Median              69           138          113
  Minimum             21           41           76
  Maximum             402          308          331
  Studies             9            14           6


Conclusion

The  purpose  of  this  study  was  to  conduct  a  limited  review  and  analysis  of  the  literature 
concerning  the  accuracy  and  reliability  of  screening  and  diagnostic  tests  in  polygraph,  medicine, 
and psychology. Of the 5,189 hits produced by the literature search, 1,158 articles and
abstracts were reviewed and 145 were found to be useful, yielding data on 198 studies. The
results  of  this  review  have  shown  there  is  an  enormous  range  in  reports  of  accuracy  and 
agreement  not  only  in  polygraph  but  also  medicine  (limited  to  diagnostic  radiology)  and 
psychology.  Overall,  polygraph  research  on  specific  issue  tests  reports  accuracy  results  similar 
to  medicine.  In  contrast,  polygraph  screening  studies  report  lower  accuracy  than  medical  studies 
but  are  similar  to  what  is  reported  in  the  psychology  literature. 

To put these results into perspective, it is worth reviewing several methodological issues
raised  almost  two  decades  ago  in  the  Office  of  Technology  Assessment's  (OTA)  report  on  the 
"Scientific  Validity  of  Polygraph  Testing."  These  issues,  taken  directly  from  the  Conclusion  of 
the  OTA  report,  are  as  follows: 

•  Accuracy  is  affected  by  factors  such  as  reader  training,  experience,  personal  bias,  and 
examinee  characteristics 

•  Cases  and  readers  are  often  selectively  chosen  rather  than  randomly 

•  Criteria  for  ground  truth  are  inadequate  in  some  studies 

•  There  is  wide  variability  in  results  from  multiple  studies 

This review found that these same methodological deficits are very evident in the medical and
psychological literature. Polemics on polygraph often correctly identify these issues but either
overstate them or fail to mention that the same problems afflict much of the research in medicine
and psychology. 2

To  put  the  results  of  this  review  in  context,  Table  9  contrasts  the  average  accuracy  found 
in  field  diagnostic  studies  with  those  reported  in  the  OTA  study  in  1983.  Since  the  OTA  study 
included  inconclusive  results  in  accuracy  estimates  (and  this  study  does  not),  it  is  not  surprising 
that  this  current  literature  review  reports  a  somewhat  greater  level  of  accuracy.  As  can  be  seen, 
however,  the  level  of  polygraph  accuracy  found  by  OTA  is  in  the  same  "ballpark"  as  that 
reported  in  medicine  and  psychology. 


Table 9: OTA findings compared to study results (field diagnostic cases)

                                          Average Accuracy
                              Sensitivity  Specificity  Combined   Number of
Aggregate Measures               (TPR)        (TNR)     Accuracy    Studies

OTA Findings (w/Incl.)            .86          .76         .81         10
Current Polygraph Findings        .92          .83         .88         37
Medicine                          .83          .88         .86         94
Psychology                        .72          .67         .70         51
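
As a rough check, the Combined Accuracy column in Table 9 can be reproduced as the unweighted average of sensitivity and specificity, which is how the Executive Summary describes the accuracy estimates. The Python sketch below uses only the values printed in Table 9. The names `TABLE_9`, `combined_accuracy`, and `check_table` are illustrative and not part of the original study, and the 0.006 tolerance is an assumption that allows for the report having averaged unrounded values.

```python
# Values transcribed from Table 9:
# (sensitivity, specificity, reported combined accuracy).
TABLE_9 = {
    "OTA Findings (w/Incl.)": (0.86, 0.76, 0.81),
    "Current Polygraph Findings": (0.92, 0.83, 0.88),
    "Medicine": (0.83, 0.88, 0.86),
    "Psychology": (0.72, 0.67, 0.70),
}


def combined_accuracy(sensitivity: float, specificity: float) -> float:
    """Unweighted mean of sensitivity (TPR) and specificity (TNR)."""
    return (sensitivity + specificity) / 2


def check_table(table: dict, tolerance: float = 0.006) -> dict:
    """Recompute combined accuracy for each row and confirm it matches the
    reported figure to within rounding (the tolerance is an assumption)."""
    recomputed = {}
    for label, (se, sp, reported) in table.items():
        value = combined_accuracy(se, sp)
        assert abs(value - reported) < tolerance, label
        recomputed[label] = value
    return recomputed


if __name__ == "__main__":
    for label, value in check_table(TABLE_9).items():
        print(f"{label:28s} {value:.3f}")
```

Because the inputs are already rounded to two decimals, a recomputed value can differ from the printed one by up to half a point in the second decimal place.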

The  findings  presented  in  Table  9  are  consistent  with  OTA's  conclusion  that  research  into 
specific  issue  polygraph  testing  has  shown  the  technique  has  some  validity.  It  does  not, 
however,  answer  the  question  of  polygraph  validity  in  screening  tests.  Although  not  very 
extensive,  this  review  reports  three  analog  and  two  field  screening  polygraph  studies.  Table  10 
puts  these  alongside  what  was  found  for  medical  and  psychological  screening.  As  can  be  seen, 
the  mean  combined  accuracy  of  medical  screening  (.86)  is  greater  than  the  mean  reported  for 
analog or field polygraph screening studies. Both field (.74) and analog (.79) polygraph studies
are similar to psychology screening studies (.76), but the reported sensitivity of field (.59)
polygraph screening is noticeably lower than the mean found for medical and psychological
screening.

2  For a recent example, see Aftergood, Steven. "Polygraph testing and the DOE National Laboratories." SCIENCE



Table 10: Rank ordered findings for screening studies

                                            Average Accuracy
                                      Sensitivity  Specificity  Combined   Number of
Aggregate Measures                       (TPR)        (TNR)     Accuracy    Studies

Medicine (Field)                          .79          .94         .86         10
Current Polygraph Findings (Analog)       .76          .82         .79          3
Psychology (Field)                        .74          .78         .76         36
Current Polygraph Findings (Field)        .59          .90         .74          2

Summary 

There  has  been  much  debate  over  the  past  30  years  about  polygraph  and  its  accuracy, 
reliability,  utility,  and  lack  of  theoretical  foundation.  It  should  be  recognized  from  this  literature 
review,  however,  that  many  of  these  same  issues  could  be  raised  about  medical  and 
psychological  diagnostic  tools.  Based  on  the  results  of  this  review,  it  is  unlikely  polygraph 
research will be able to reach a level of accuracy and reliability that satisfies its opponents. It
suffers from the same flaws as many other diagnostic tools: it will not be 100% accurate, nor will
its application from one subject to the next, or by one examiner to another, be invariant.

The  accuracy  of  humans  assessing  humans  is  unlikely  to  be  100%.  As  has  been  shown  in 
this  brief  survey  of  the  medical  and  psychological  literature,  there  is  wide  variation  in  the 
accuracy of diagnostic tools from one application to the next. In fact, there is often wide
variation between studies focused on one diagnostic tool, such as has been seen in past polygraph
reviews. Since perfection remains elusive, some professions have learned to accept and manage
this  uncertainty.  As  an  example,  training,  standardization,  and  an  ongoing  review  of  procedures 
are  used  to  establish  a  baseline  of  acceptable  practice  and  alternative  mechanisms  are  developed 
and  employed  to  help  clarify  equivocal  test  results. 

Recommendations 

One  of  the  goals  of  this  study  is  to  compare  polygraph  research  methodology  to  that  used 
in  medicine  (diagnostic  radiology)  and  psychology.  The  author  does  not  claim  to  be  an  expert  in 
all  these  methodologies,  so  what  follows  are  general  impressions  from  the  literature  review  and 
personal  experience. 

1.  Room for the "Medical" Perspective? Most research found in the medical profession uses
very  specific  terminology  (sensitivity,  specificity,  odds  ratios)  and  other  methodological 
approaches  (receiver  operating  characteristic  curves)  for  assessing  diagnostic  tests.  Greater 
use  of  similar  terminologies  and  methodological  approaches  in  polygraph  may  make  the 
results  of  polygraph  research  more  meaningful  to  those  outside  the  polygraph  and 
psychology  community.  There  may  also  be  a  reservoir  of  methodologies  and  knowledge  in 
the  health  technology  assessment  field  that  could  be  applied  to  polygraph  research. 

2.  Greater Focus on Accuracy and Reliability? The polygraph literature reviewed for this study
occasionally devoted more effort to sophisticated analysis of process and metrics than to
clarifying how these factors affect accuracy and reliability. In
several cases, basic measures of accuracy or reliability were not immediately apparent
(sensitivity, specificity, and kappa had to be hand calculated from frequencies in tables).

3.  Analog  Generalizability?  Although  analog  studies  are  practically  nonexistent  in  the  medical 
and  psychological  literature,  they  appear  to  serve  an  important  role  in  polygraph  research  and 
are  very  valuable  in  assessing  the  internal  validity  of  a  test.  It  is  difficult,  however,  to  see 
how  analog  studies  are  generalizable  to  an  applied  (clinical)  setting.  As  with  medical  and 
psychological tools, it is unlikely polygraph will be able to demonstrate efficacy with a heavy
reliance  on  laboratory  studies. 

4.  Screening  Studies?  Compared  to  the  number  of  specific  issue  polygraph  studies,  there  were 
relatively  few  screening  studies  available  for  review.  Although  developing  a  suitable  gold 
standard  (ground  truth)  for  an  evaluation  of  field  polygraph  screening  is  a  very  difficult 
problem  to  surmount,  similar  issues  have  been  faced  by  medicine  and  psychology.  Based  on 
the  medical  and  psychological  literature,  there  appears  to  be  a  considerable  range  in  the 
quality  of  gold  standards.  The  point  suggested  from  the  literature  is  that  lack  of  a  "pure"  gold 
standard  has  not  stopped  screening  research  in  psychology  and  medicine. 

5.  Too  Many  Inconclusives?  Although  inconclusive  test  results  were  a  common  element  in 
polygraph  research,  they  were  seldom  mentioned  in  the  medical  and  psychological  literature. 
Any accuracy or reliability study that selects out only obvious interpretations will inflate
accuracy estimates and threaten the legitimacy of both the research and the assessment
technique.
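
Recommendations 2 and 5 both turn on how basic rates are derived from frequency tables. The following is a minimal Python sketch, assuming a 2x2 validation table plus separate counts of inconclusive results. Every count in the example is hypothetical, the names `Counts` and `accuracy` are illustrative, and keeping inconclusives in the denominator as non-correct outcomes is one assumed convention, not necessarily the one used by OTA or by this review.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Frequencies from a validation table (hypothetical values only)."""
    true_pos: int               # condition present, test positive
    false_neg: int              # condition present, test negative
    true_neg: int               # condition absent, test negative
    false_pos: int              # condition absent, test positive
    inconclusive_pos: int = 0   # condition present, no decision reached
    inconclusive_neg: int = 0   # condition absent, no decision reached


def accuracy(c: Counts, count_inconclusives: bool = False) -> dict:
    """Sensitivity, specificity, and their unweighted mean.

    With count_inconclusives=True, inconclusive results stay in the
    denominator and are never counted as correct (assumed convention).
    """
    pos_total = c.true_pos + c.false_neg
    neg_total = c.true_neg + c.false_pos
    if count_inconclusives:
        pos_total += c.inconclusive_pos
        neg_total += c.inconclusive_neg
    sensitivity = c.true_pos / pos_total
    specificity = c.true_neg / neg_total
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "combined": (sensitivity + specificity) / 2,
    }


if __name__ == "__main__":
    example = Counts(true_pos=45, false_neg=5, true_neg=40, false_pos=10,
                     inconclusive_pos=10, inconclusive_neg=10)
    print("excluding inconclusives:", accuracy(example))
    print("including inconclusives:", accuracy(example, count_inconclusives=True))
```

With these hypothetical counts, dropping the inconclusive cases raises every rate, which is the inflation that Recommendation 5 warns about.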


Executive  Summary 

A  Comparative  Analysis  of  Polygraph  with  other  Screening  and  Diagnostic  Tools 


Method

A limited review of literature published between January
1986 and May 2001 was conducted to evaluate studies
reporting the accuracy and reliability of screening and
diagnostic tests in polygraph, medicine, and psychology.
Data for 198 studies were collected from 145 articles.
Accuracy estimates are the combined average of
sensitivity and specificity across all studies found within
a particular category (1.00 = 100% accuracy).

Diagnostic and Screening Accuracy

For field diagnostic assessments, the accuracy of
polygraph, medical, and psychological tools was .88,
.86, and .70 respectively. For field screening
assessments, the accuracy of polygraph, medical, and
psychological tools was .74, .86, and .76 respectively.


Accuracy of Various Diagnostic Tools

The average accuracy reported for 37 diagnostic
polygraph studies (specific issue) was similar to MRI (17
studies), CT (19 studies), and ultrasound (38 studies).
MMPI had the lowest reported accuracy (17 studies).


Accuracy by Target Condition

The average diagnostic accuracy for detecting deception
with polygraph was similar to diagnosing breast cancer
with MRI or ultrasound (US).


Agreement  (kappa) 

Averaging a standard measure of agreement across the
reviewed literature suggests polygraph and psychology
studies report similar levels of agreement. A kappa
value of 1.0 represents 100% agreement beyond what
would be expected by chance.
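
To make the kappa statistic concrete, here is a minimal Python sketch of Cohen's kappa for two readers scoring the same cases. The cross-tabulation in the example is hypothetical, the function name `agreement_stats` is illustrative, and the report does not state which kappa variant each reviewed study used, so this two-reader form is an assumption.

```python
def agreement_stats(table):
    """Return (percent agreement, Cohen's kappa) for a square table.

    table[i][j] is the number of cases that reader A placed in category i
    and reader B placed in category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the readers' marginal proportions.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    # Undefined when expected == 1 (both readers always use one category).
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical counts: rows = reader A (deceptive, truthful),
    # columns = reader B (deceptive, truthful).
    readers = [[40, 10],
               [5, 45]]
    agree, kappa = agreement_stats(readers)
    print(f"agreement = {agree:.2f}, kappa = {kappa:.2f}")
```

In this hypothetical case the readers agree on 85% of cases, but half of that agreement would be expected by chance, so kappa is lower than the raw percentage.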


Conclusion 

The  level  of  accuracy  and  agreement  reported  in  the 
polygraph  literature  is  consistent  with  the  medical  and 
psychological literature.

Polygraph Studies


Field Accuracy Studies

                        Screening                         Diagnostic
             Se     Sp     Total %   Cases      Se     Sp     Total %   Cases
Mean         0.59   0.90   0.74      467        0.92   0.83   0.88      108
Median       0.59   0.90   0.74      467        0.95   0.90   0.90       64
Minimum      0.45   0.87   0.69      200        0.71   0.43   0.64       16
Maximum      0.73   0.93   0.80      733        1.00   1.00   1.00      959
Count        2      2      2           2        37     37     37         37

Screening studies

First Author                                   Year   Se     Sp     Total %   Cases
Jayne-screening                                1989   0.45   0.93   0.69      200
Brownlie                                       1997   0.73   0.87   0.80      733

Diagnostic studies

First Author                                   Year   Se     Sp     Total %   Cases
Bersh-GQT                                      1969   0.97   0.89   0.93       68
Bersh-GQT & ZOC majority                       1969   0.71   0.80   0.76       59
Bersh-GQT and ZOC                              1969   0.93   0.92   0.93      157
Bersh-ZOC                                      1969   0.89   0.94   0.92       89
Horvath                                        1971   0.85   0.91   0.88       40
Hunter                                         1973   0.88   0.86   0.87       20
Slowick                                        1975   0.85   0.93   0.89       30
Wicklander                                     1975   0.95   0.93   0.94       20
Barland-judicial                               1976   1.00   0.43   0.72       41
Barland-panel                                  1976   0.98   0.45   0.72       64
Raskin-nonnumerical                            1976   0.93   0.69   0.81       16
Raskin-numerical                               1976   1.00   0.95   0.98       16
Horvath                                        1977   0.77   0.51   0.64       56
Davidson                                       1979   0.88   0.81   0.85       20
Yamamura & Miyake                              1980   0.80   0.94   0.87       95
Edwards                                        1981   0.98   0.98   0.98      959
Kleinmuntz                                     1982   0.75   0.63   0.69       80
Putnam                                         1983   0.99   0.95   0.97      285
Elaad-blind                                    1985   0.77   0.77   0.77       60
Elaad-blind                                    1985   0.77   0.90   0.84       60
Elaad & Schahar                                1985   0.99   0.95   0.97      174
Yankee-Experienced examiners-without incl      1985   1.00   0.99   1.00       51
Yankee-Inexperienced examiners-without incl    1985   1.00   0.99   1.00       51
Patrick                                        1987   1.00   0.90   0.95       81
Patrick-blind                                  1987   0.98   0.55   0.77       69
Honts & Driscoll-ranked scores                 1988   0.94   0.65   0.80       25
Honts & Driscoll-std num scores                1988   0.97   0.77   0.87       25
Honts-blind                                    1988   1.00   0.80   0.90       21
Raskin                                         1988   0.95   0.96   0.96       85
Raskin-blind                                   1988   0.94   0.86   0.90       70
Franz-blind                                    1989   1.00   0.97   0.99       81
Matte-blind                                    1989   1.00   1.00   1.00      114
Murray                                         1989   1.00   0.86   0.93      171
Arellano-blind                                 1990   1.00   1.00   1.00       40
Patrick & Iacono                               1991   0.97   0.56   0.77      402
Krapohl et al-specific issue                   2001   0.87   0.92   0.90      221
Krapohl et al-specific issue                   2001   0.84   0.97   0.91       76

Polygraph Studies


Analog Accuracy Studies

                        Screening                         Diagnostic
             Se     Sp     Total %   Cases      Se     Sp     Total %   Cases
Mean         0.76   0.82   0.79       50        0.89   0.78   0.84       72
Median       0.67   0.83   0.72       40        0.92   0.79   0.85       55
Minimum      0.61   0.63   0.65       40        0.63   0.49   0.60       15
Maximum      1.00   1.00   1.00       71        1.00   0.97   0.98      192
Count        3      3      3           3        18     18     18         18

Screening studies

First Author                                   Year   Se     Sp     Total %   Cases
Correa & Adams-preemployment                   1981   1.00   1.00   1.00       40
Ansley-screening                               1989   0.61   0.83   0.72       71
Honts                                          1999   0.67   0.63   0.65       40

Diagnostic studies

First Author                                   Year   Se     Sp     Total %   Cases
Barland                                        1975   0.83   0.71   0.77       72
Podlesny                                       1978   0.81   0.96   0.89       40
Raskin                                         1978   1.00   0.95   0.98       48
Rovner                                         1978   0.90   0.85   0.88       72
Dawson                                         1980   1.00   0.70   0.85       24
Hammond                                        1980   0.96   0.67   0.82       62
Bradley-electrodermal                          1981   0.82   0.86   0.84      192
Bradley-heart rate                             1981   0.63   0.63   0.63      192
Szecko                                         1981   0.71   0.49   0.60       30
Ginton                                         1982   1.00   0.85   0.93       15
Honts                                          1982   1.00   0.67   0.84       21
Honts                                          1982   1.00   0.67   0.84       38
Kischer                                        1983   0.94   0.97   0.96      100
Honts & Driscoll-ranked scores                 1987   0.95   0.86   0.91       41
Honts & Driscoll-std num scores                1987   0.86   0.86   0.86       44
Barland et al-multiple issue approach          1990   0.93   0.75   0.84      100
Barland et al-single issue approach            1990   0.91   0.83   0.87      100
Blackwell-Examiners                            1996   0.86   0.73   0.79      108
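
The summary rows (Mean, Median, Minimum, Maximum, Count) in these appendix tables appear to be plain descriptive statistics over the per-study rows. As a hedged illustration, the Python sketch below recomputes them for the three analog polygraph screening studies listed in this table, using the standard-library `statistics` module. The names `ANALOG_SCREENING` and `summarize` are illustrative, and rounding to two decimals (whole numbers for cases) is an assumption about how the report formatted its figures.

```python
from statistics import mean, median

# (first author, year, Se, Sp, Total %, cases) as printed in the appendix
# for the three analog polygraph screening studies.
ANALOG_SCREENING = [
    ("Correa & Adams-preemployment", 1981, 1.00, 1.00, 1.00, 40),
    ("Ansley-screening", 1989, 0.61, 0.83, 0.72, 71),
    ("Honts", 1999, 0.67, 0.63, 0.65, 40),
]


def summarize(values, digits=2):
    """Descriptive statistics in the layout of the appendix summary rows."""
    return {
        "mean": round(mean(values), digits),
        "median": round(median(values), digits),
        "min": min(values),
        "max": max(values),
        "count": len(values),
    }


if __name__ == "__main__":
    columns = {"Se": 2, "Sp": 3, "Total %": 4, "Cases": 5}
    for name, index in columns.items():
        digits = 0 if name == "Cases" else 2
        column = [row[index] for row in ANALOG_SCREENING]
        print(name, summarize(column, digits))
```

Run against the printed per-study values, this reproduces the screening summary figures shown in the table (for example a mean sensitivity of 0.76 and a median of 0.67).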



Polygraph Studies


Field Agreement Studies

                     Screening                    Diagnostic
             Agree %   Kappa   Cases      Agree %   Kappa   Cases
Mean         0.98      -        53        0.91      0.77    102
Median       0.99      -        60        0.91      0.80     69
Minimum      0.95      -        40        0.77      0.53     21
Maximum      0.99      -        60        1.00      1.00    402
Count        3         0         3        8         9         9

Screening studies

First Author                                   Year   Agree %   Kappa   Cases
Edel & Moore                                   1984   0.95      -        40
Yankee-Experienced examiners-with incl         1985   0.99      -        60
Yankee-Inexperienced examiners-with incl       1985   0.99      -        60

Diagnostic studies

First Author                                   Year   Agree %   Kappa   Cases
Elaad                                          1985   0.77      0.53     60
Elaad                                          1985   0.83      0.67     60
Patrick                                        1987   0.86      0.60     69
Honts                                          1988   0.90      0.81     21
Raskin                                         1988   0.91      0.80     70
Franz                                          1989   0.99      0.98     81
Matte                                          1989   1.00      1.00    114
Arellano                                       1990   1.00      1.00     40
Patrick & Iacono                               1991   -         0.53    402

Medical Studies


Field Accuracy Studies

                        Screening                         Diagnostic
             Se     Sp     Total %   Cases      Se     Sp     Total %   Cases
Mean         0.79   0.94   0.86       56581     0.83   0.88   0.86       284
Median       0.78   0.93   0.85       19758     0.85   0.93   0.88       124
Minimum      0.51   0.87   0.76          79     0.25   0.44   0.60        23
Maximum      0.97   1.00   0.99      202070     1.00   1.00   1.00      4811
Count        10     10     10            10     94     94     94          89

Screening studies

First Author                                   Tech               Year   Se     Sp     Total %   Cases
Baines-breast cancer                           Mammography        1988   0.75   0.94   0.85       44718
Unk author-breast cancer                       Mammography        1992   0.91   0.96   0.94       72706
Burhenne-breast cancer                         Mammography        1995   0.84   0.93   0.88      201937
Beam-breast cancer                             Mammography        1996   0.79   0.90   0.85          79
Heinzen-breast cancer                          Mammography        2000   0.78   0.92   0.85      202070
Mettlin-prostate cancer                        US                 1991   0.77   0.89   0.83        2425
Levi-congenital anomalies                      US                 1995   0.51   1.00   0.76       25046
Strandell-endometrial pathology                US                 1999   0.73   0.87   0.80         103
Lennon-neural tube and ventral wall defects    US                 1999   0.97   1.00   0.99        2257
van Nagell-ovarian cancer                      US                 2000   0.81   0.99   0.90       14469

Diagnostic studies

First Author                                   Tech               Year   Se     Sp     Total %   Cases
Stark-Hepatic metastases                       CT                 1987   0.80   0.94   0.87         135
Shackford-minor head injuries                  CT                 1992   1.00   0.51   0.76        2166
Pasanen-unjaundiced cholestasis                CT                 1994   0.53   0.86   0.70          33
van Gils-paraganglioma of the head/neck        CT                 1994   0.73   0.94   0.84          60
Budoff-coronary artery disease                 CT                 1996   0.95   0.44   0.70         710
Rao-appendicitis                               CT                 1997   0.98   0.98   0.98         100
Mushlin-multiple sclerosis                     CT                 1997   0.25   0.95   0.60         303
Mushlin-brain tumor                            CT                 1997   0.93   1.00   0.97         303
Mushlin-cerebrovascular disease                CT                 1997   0.88   0.95   0.92         303
D'Ippolito-appendicitis                        CT                 1998   0.91   1.00   0.96          52
Miller-acute flank pain                        CT                 1998   0.96   1.00   0.98         106
Vieweg-acute flank pain                        CT                 1998   0.98   0.98   0.98         105
Keberle-throat tumors                          CT                 1999   0.88   1.00   0.94          99
Lane-appendicitis                              CT                 1999   0.96   0.99   0.98         300
Garcia-appendicitis                            CT                 1999   0.94   0.94   0.94         139
Valk-colorectal cancer                         CT                 1999   0.69   0.96   0.83         115
Kurtz-ovarian cancer                           CT                 1999   0.92   0.89   0.91         213
Walker-appendicitis                            CT                 2000   0.94   1.00   0.97          65
Joseph-open-globe injuries                     CT                 2000   0.75   0.93   0.84         200
von Kummer-stroke damage                       CT                 2001   0.64   0.85   0.75         786
Stafford-blunt abdominal trauma                CT no contrast     1999   0.89   0.57   0.73         195
Stafford-blunt abdominal trauma                CT with contrast   1999   0.84   0.94   0.89         199
Martelli-breast cancer                         Mammography        1990   0.73   0.80   0.77        1708
Elmore-breast cancer                           Mammography        1994   0.70   0.94   0.82         150
Cwikla-breast cancer                           Mammography        1998   0.70   0.57   0.64          70
Fenlon-breast cancer                           Mammography        1998   0.81   0.82   0.82          44
Drew-breast cancer                             Mammography        1999   0.88   0.89   0.88         285
Zonderland-breast cancer                       Mammography        1999   0.83   0.97   0.90        4811
Moss-breast cancer                             Mammography        1999   0.79   0.83   0.81         559
Stark-Hepatic metastases                       MR                 1987   0.82   0.99   0.91         135
Barronian-imaging of the knee                  MR                 1989   0.67   0.86   0.77          23
Glashow-anterior cruciate and meniscal lesions MR                 1989   0.83   0.84   0.84          47
Mooney-multiple sclerosis                      MR                 1990   0.88   0.94   0.91           -
Mooney-brain infarct                           MR                 1990   0.88   1.00   0.94           -
Mooney-brain tumor                             MR                 1990   0.93   0.95   0.94           -
Mooney-other brain disease                     MR                 1990   0.91   0.92   0.92           -
Young-carotid artery stenosis                  MR                 1994   0.89   0.82   0.86          70
Levine-osteomyelitis                           MR                 1994   0.77   1.00   0.89          26
Ascher-Endometriosis                           MR                 1995   0.76   0.60   0.68          31
Mussurakis-breast cancer                       MR                 1996   0.99   0.56   0.78          57
Mushlin-multiple sclerosis                     MR                 1997   0.58   0.91   0.75         303
Mushlin-brain tumor                            MR                 1997   0.93   1.00   0.97         303
Mushlin-cerebrovascular disease                MR                 1997   1.00   1.00   1.00         303
Regan-acute cholecystitis                      MR                 1998   0.91   0.79   0.85          72
Kurtz-ovarian cancer                           MR                 1999   0.98   0.88   0.93         280
Drew-breast lesions                            MR                 1999   0.99   0.91   0.95         285
Blanchard-rotator cuff tears                   MR                 1999   0.79   0.81   0.80          38
Razumovsky-acute cerebral ischemia             MR                 1999   0.84   1.00   0.92          30
Adamek-pancreatic cancer                       MR                 2000   0.84   0.97   0.91         124
Schroter-Creutzfeldt-Jakob disease             MR                 2000   0.67   0.93   0.80         220
Imbriaco-breast masses                         MR                 2001   0.96   0.75   0.86          49
Scott-orthopedic fractures                     PLAIN FILM         1993   0.79   0.83   0.81          60

Medical Studies


Analog Accuracy Studies

                        Screening                         Diagnostic
             Se     Sp     Total %   Cases      Se     Sp     Total %   Cases
Mean         -      -      -         -          0.81   0.73   0.77       92
Median       -      -      -         -          0.81   0.73   0.77       92
Minimum      -      -      -         -          0.79   0.59   0.71       80
Maximum      -      -      -         -          0.83   0.86   0.83      104
Count        0      0      0         0          2      2      2           2

Diagnostic studies

First Author                                   Tech   Year   Se     Sp     Total %   Cases
Hill-foreign bodies in human tissue            US     1997   0.83   0.59   0.71       80
Orlinsky-radiolucent foreign body              US     2000   0.79   0.86   0.83      104


Medical Studies


Field Agreement Studies

                     Screening                    Diagnostic
             Agree %   Kappa   Cases      Agree %   Kappa   Cases
Mean         -         -       -          0.81      0.56    150
Median       -         -       -          0.80      0.60    138
Minimum      -         -       -          0.77      0.34     41
Maximum      -         -       -          0.85      0.72    308
Count        0         0       0          5         13       14

Diagnostic studies

First Author                                   Tech          Year   Agree %   Kappa   Cases
Freed-ureteral stone disease                   CT            1998   -         0.69    103
Grotta-stroke                                  CT            1999   0.77      0.39     70
Weishaupt-osteoarthritis                       CT            1999   -         0.60    308
Kurtz-ovarian cancer                           CT            1999   -         0.65    213
Elmore-breast cancer                           Mammography   1994   0.78      0.47    150
Baker-breast cancer                            Mammography   1996   -         0.34     60
Brant-Zawadzk-lumbar disc abnormalities        MR            1995   0.80      0.58    125
Mussurakis-breast cancer                       MR            1996   -         0.42     57
Weishaupt-osteoarthritis                       MR            1999   -         0.41    308
Kurtz-ovarian cancer                           MR            1999   -         0.70    179
Masdeu-head trauma                             SPECT         1994   0.83      0.72     41
Wong-early pregnancy complications             US            1998   0.85      -       151
Kurtz-ovarian cancer                           US            1999   -         0.66    264
Ihlberg-vein grafts                            US            2000   -         0.69     69



Psychology Studies


Field Accuracy Studies

                        Screening                         Diagnostic
             Se     Sp     Total %   Cases      Se     Sp     Total %   Cases
Mean         0.74   0.78   0.76        996      0.72   0.67   0.70       218
Median       0.79   0.85   0.78        307      0.71   0.65   0.69        84
Minimum      0.11   0.00   0.42         55      0.37   0.41   0.50        29
Maximum      1.00   1.00   0.98      16235      0.96   0.95   0.93      1079
Count        36     36     36           36      51     51     51          51

Screening studies

First Author                                          Year   Se     Sp     Total %   Cases
Bradley-alcohol screening                             1998   0.80   0.86   0.83        771
Bradley-alcohol screening                             1998   0.78   0.89   0.83        771
Brooks-neuropsychological screening                   1990   0.80   1.00   0.90        175
Glascoe-developmental screening                       1993   0.72   0.76   0.74         89
Steer-major depression                                1999   0.97   0.99   0.98        120
Bradley-alcohol screening                             1998   0.52   0.85   0.69        771
Bradley-alcohol screening                             1998   0.35   0.98   0.66        771
Dent-memory problems in multiple sclerosis            2000   0.93   0.48   0.71         61
Bradley-alcohol screening                             1998   0.91   0.77   0.84        771
Bradley-alcohol screening                             1998   0.75   0.89   0.82        771
Parikh-post-stroke depression                         1988   0.86   0.90   0.88         80
Baird-autism at 18 months of age                      2000   0.38   0.98   0.68      16235
Chen-attention-deficit hyperactivity                  1994   0.23   1.00   0.62        122
Scheinberg-eating disorders                           1993   0.93   0.41   0.67       1112
Scheinberg-eating disorders                           1993   1.00   0.38   0.69       1112
Gureje                                                1990   0.68   0.70   0.69        787
Pomeroy-depression                                    2001   0.91   0.65   0.78         87
Razavi-adjustment and major depressive disorders      1990   0.70   0.75   0.73        210
Inwald-performance of government security personnel   1991   0.60   0.76   0.68        307
Johnson-pathological gamblers                         1998   1.00   0.85   0.93        423
Benussi-alcoholism                                    1982   1.00   0.94   0.97        104
Yersin-alcoholism                                     1989   0.70   0.92   0.81        268
Erford-Math Essential skills screen                   1998   0.98   0.88   0.93        100
Uhlmann-dementia                                      1991   0.81   0.97   0.89        209
Inwald-performance of government security personnel   1991   0.45   0.73   0.59        307
Colligan-alcoholism                                   1988   0.74   0.84   0.79       2144
Colligan-alcoholism                                   1988   1.00   0.00   0.50       2144
Colligan-alcoholism                                   1988   0.62   0.34   0.48       2144
Hirschfeld-bipolar spectrum disorder                  2000   0.73   0.90   0.82        198
Sherman-Pediatric Language Acquisition                1999   0.11   0.73   0.42         84
Hiatt-job performance problems                        1988   0.68   0.73   0.70         55
de las Cuevas-Severity of Dependence Scale (SDS)      2000   0.98   0.94   0.96        100
Birtchnell-depressive disorders                       1989   0.94   0.86   0.90        133
Bradley-alcohol screening                             1998   0.48   0.86   0.67        771
Bradley-alcohol screening                             1998   0.91   0.72   0.82        771
Bradley-alcohol screening                             1998   0.82   0.85   0.83        771

Diagnostic studies

First Author                                          Year   Se     Sp     Total %   Cases
Berument-Autism                                       1999   0.85   0.75   0.80        200
Kogan-Geriatric Depression Scale                      1994   0.64   0.73   0.69         59
Laprise-Geriatric Depression                          1998   0.96   0.46   0.71         66
Blais-FRANTIC AVOIDANCE                               1999   0.63   0.95   0.79         76
Blais-UNSTABLE RELATIONSHIPS                          1999   0.94   0.91   0.93         76
Blais-IDENTITY DISTURBANCE                            1999   0.73   0.84   0.79         76
Blais-IMPULSIVITY                                     1999   0.55   0.70   0.63         76
Blais-SUICIDAL                                        1999   0.96   0.41   0.69         76
Blais-AFFECTIVE INSTABILITY                           1999   0.91   0.42   0.67         76
Blais-CHRONIC EMPTINESS                               1999   0.52   0.79   0.66         76
Blais-POORLY CONTROLLED ANGER                         1999   0.73   0.53   0.63         76
Blais-STRESS-RELATED PARANOIA                         1999   0.51   0.60   0.56         76
Kogan-Geriatric Depression Scale                      1994   0.79   0.69   0.74         59
Laprise-Geriatric Depression                          1998   0.89   0.56   0.73         66
Merson-personality disorders                          1994   0.95   0.50   0.73         29
Ivnick-MAYO VERBAL COMPREHENSION FACTOR S             2000   0.55   0.85   0.70       1079
Chaffee-expressive and receptive language scales      1990   0.88   0.45   0.67        152
Ivnick-ATTENTION-CONCENTRATION SCORE                  2000   0.71   0.70   0.71       1079
Ivnick-LEARNING FACTOR SCORE                          2000   0.77   0.84   0.81       1079
Ivnick-PERCEPTUAL ORGANIZATION SCORE                  2000   0.70   0.83   0.77       1079
Ivnick-RETENTION SCORE                                2000   0.88   0.80   0.84       1079
BOONE 1994-depression                                 1994   0.61   0.62   0.62         62
WETZLER 1998-depression                               1998   0.64   0.65   0.65        113
BEN-PORATH 1991-depression                            1991   0.66   0.64   0.65         73
BEN-PORATH 1991-depression                            1991   0.63   0.61   0.62         87
MUNLEY 1997-depression                                1997   0.71   0.71   0.71         84
GREENBLATT 1999-depression                            1999   0.54   0.56   0.55         75

Psychology Studies


Field Agreement Studies

                     Screening                    Diagnostic
             Agree %   Kappa   Cases      Agree %   Kappa   Cases
Mean         -         0.61    510        0.88      0.79    174
Median       -         0.61    510        0.88      0.79    113
Minimum      -         0.61    510        0.88      0.64     76
Maximum      -         0.61    510        0.88      0.91    331
Count        0         1       1          1         6         6

Screening studies

First Author                                   Year   Agree %   Kappa   Cases
Lavigne-DSM-III-R with preschool children      1994   -         0.61    510

Diagnostic studies

First Author                                   Year   Agree %   Kappa   Cases
Klin-autism                                    2000   0.88      0.71    131
DSM-III Phase Two Field Trials                 1980   -         0.72    331
DSM-III Phase Two Field Trials                 1980   -         0.64    331
Blais-NINE SCALE PERSONALITY DISORDER          1999   -         0.85     76
Hogervors-Alzheimer's disease                  2000   -         0.90     82
Hilsenroth-Schizophrenia                       1998   -         0.91     95